Abstract The asymptotic expansion of the distribution of the gradient test statistic is
derived for a composite hypothesis under a sequence of Pitman alternative hypotheses
converging to the null hypothesis at rate $n^{-1/2}$, $n$ being the sample size.
Comparisons of the local powers of the gradient, likelihood ratio, Wald and score tests
reveal no uniform superiority property. The power performance of all four criteria in the
one-parameter exponential family is examined.

1 Introduction

The most commonly used large sample tests are the likelihood ratio (Wilks, 1938),
Wald (Wald, 1943) and Rao score (Rao, 1948) tests. Recently, Terrell (2002) proposed
a new test statistic that shares the same first order asymptotic properties with the
likelihood ratio ($LR$), Wald ($W$) and Rao score ($S_R$) statistics. The new statistic,
referred to as the gradient statistic ($S_T$), is markedly simple. In fact, Rao (2005)
wrote: "The suggestion by Terrell is attractive as it is simple to compute. It would
be of interest to investigate the performance of the [gradient] statistic." The present
paper goes in this direction.

Let $x = (x_1, \ldots, x_n)^\top$ be a random vector of $n$ independent observations with
probability density function $\pi(x \mid \theta)$ that depends on a $p$-dimensional vector of
unknown parameters $\theta = (\theta_1, \ldots, \theta_p)^\top$. Consider the problem of testing the composite
null hypothesis $\mathcal{H}_0: \theta_2 = \theta_{20}$ against $\mathcal{H}_1: \theta_2 \neq \theta_{20}$, where $\theta = (\theta_1^\top, \theta_2^\top)^\top$,
$\theta_1 = (\theta_1, \ldots, \theta_q)^\top$ and $\theta_2 = (\theta_{q+1}, \ldots, \theta_p)^\top$, $\theta_{20}$ representing a $(p-q)$-dimensional
fixed vector. Let $\ell$ be the total log-likelihood function, i.e. $\ell = \ell(\theta) = \sum_{l=1}^{n} \log \pi(x_l \mid \theta)$.
Let $U(\theta) = \partial\ell/\partial\theta = (U_1(\theta)^\top, U_2(\theta)^\top)^\top$ be the corresponding total score
function partitioned following the partition of $\theta$. The restricted and unrestricted
maximum likelihood estimators of $\theta$ are $\tilde\theta = (\tilde\theta_1^\top, \theta_{20}^\top)^\top$ and $\hat\theta = (\hat\theta_1^\top, \hat\theta_2^\top)^\top$,
respectively.

The gradient statistic for testing $\mathcal{H}_0$ is

$$S_T = U(\tilde\theta)^\top (\hat\theta - \tilde\theta). \qquad (1)$$

Since $U_1(\tilde\theta) = 0$, the gradient statistic in (1) can be written as
$S_T = U_2(\tilde\theta)^\top (\hat\theta_2 - \theta_{20})$. Clearly, $S_T$ has a very simple form and does not involve
knowledge of the information matrix, neither expected nor observed, and no matrices,
unlike $W$ and $S_R$. Asymptotically, $S_T$ has a central chi-square distribution with $p - q$
degrees of freedom under $\mathcal{H}_0$. Terrell (2002) points out that the gradient statistic
"is not transparently non-negative, even though it must be so asymptotically." His
Theorem 2 implies that if the log-likelihood function is concave and is differentiable
at $\tilde\theta$, then $S_T \geq 0$.
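
As an illustration of (1) with a nuisance parameter, the following Python sketch computes the gradient statistic for testing the mean of a normal sample when the variance is unknown. The model, the function name `gradient_statistic_normal_mean` and the data are assumptions chosen for illustration; they are not taken from Terrell (2002) or from this paper. Here the nuisance parameter is the variance, the parameter of interest is the mean, and the statistic reduces to n(xbar - mu0)^2 / sigma2_tilde, which is visibly non-negative.

```python
from statistics import fmean

def gradient_statistic_normal_mean(xs, mu0):
    """Gradient statistic for H0: mu = mu0 in a N(mu, sigma^2) sample,
    with sigma^2 treated as an unknown nuisance parameter."""
    n = len(xs)
    xbar = fmean(xs)  # unrestricted MLE of mu
    # Restricted MLE of sigma^2 under H0 (mu held fixed at mu0).
    sigma2_tilde = sum((x - mu0) ** 2 for x in xs) / n
    # Score for mu evaluated at the restricted estimate (mu0, sigma2_tilde).
    # The score for sigma^2 vanishes there, so only this component matters.
    u_mu_tilde = n * (xbar - mu0) / sigma2_tilde
    # S_T = U_2(theta_tilde) * (mu_hat - mu0)
    return u_mu_tilde * (xbar - mu0)

sample = [1.2, 0.8, 1.9, 1.4, 0.7]  # made-up data, for illustration only
print(gradient_statistic_normal_mean(sample, mu0=1.0))
```

No information matrix is formed at any point: only the restricted fit, the unrestricted fit and one score evaluation are needed.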

In this paper we derive the asymptotic distribution of the gradient statistic for
a composite null hypothesis under a sequence of Pitman alternatives converging to
the null hypothesis at a convergence rate $n^{-1/2}$. In other words, the sequence of
alternative hypotheses is $\mathcal{H}_{1n}: \theta_2 = \theta_{20} + n^{-1/2}\epsilon$, where $\epsilon = (\epsilon_{q+1}, \ldots, \epsilon_p)^\top$.
Similar results for the likelihood ratio and Wald tests were obtained by Hayakawa
(1975) and for the score test, by Harris & Peers (1980). Comparison of local power
properties of the competing tests will be performed. Our results will be specialized to
the case of the one-parameter exponential family. A brief discussion closes the paper.



2 Notation and preliminaries 



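Our notation follows that of Hayakawa (1975, 1977). We introduce the following
log-likelihood derivatives

$$y_r = n^{-1/2}\frac{\partial\ell}{\partial\theta_r}, \qquad y_{rs} = n^{-1}\frac{\partial^2\ell}{\partial\theta_r\,\partial\theta_s}, \qquad y_{rst} = n^{-3/2}\frac{\partial^3\ell}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t},$$

their arrays $y = (y_1, \ldots, y_p)^\top$, $Y = ((y_{rs}))$, $Y_{...} = ((y_{rst}))$, the corresponding
cumulants $\kappa_{rs} = E(y_{rs})$, $\kappa_{r,s} = E(y_r y_s)$, $\kappa_{rst} = n^{1/2}E(y_{rst})$, $\kappa_{r,st} = n^{1/2}E(y_r y_{st})$,
$\kappa_{r,s,t} = n^{1/2}E(y_r y_s y_t)$ and their arrays $K = ((\kappa_{r,s}))$, $K_{...} = ((\kappa_{rst}))$,
$K_{.,..} = ((\kappa_{r,st}))$ and $K_{.,.,.} = ((\kappa_{r,s,t}))$.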

We make the same assumptions as in Hayakawa (1975). In particular, it is assumed
that the $\kappa$'s are all $O(1)$ and they are not functionally independent; for instance,
$\kappa_{r,s} = -\kappa_{rs}$. Relations among them were first obtained by Bartlett (1953a,b).
Also, it is assumed that $Y$ is non-singular and that $K$ is positive definite with inverse
$K^{-1} = ((\kappa^{r,s}))$ say. For triple-suffix quantities we use the following summation
notation

$$K_{...} \circ a \circ b \circ c = \sum_{r,s,t=1}^{p} \kappa_{rst}\, a_r b_s c_t, \qquad K_{.,..} \circ M \circ b = \sum_{r,s,t=1}^{p} \kappa_{r,st}\, m_{rs} b_t,$$

where $M$ is a $p \times p$ matrix and $a$, $b$ and $c$ are $p \times 1$ column vectors.
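The partition $\theta = (\theta_1^\top, \theta_2^\top)^\top$ induces the corresponding partitions:

$$Y = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}, \qquad K = \begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix}, \qquad K^{-1} = \begin{bmatrix} K^{11} & K^{12} \\ K^{21} & K^{22} \end{bmatrix},$$

$a = (a_1^\top, a_2^\top)^\top$, etc. Also,

$$K_{2..} \circ a_2 \circ b \circ c = \sum_{r=q+1}^{p}\ \sum_{s,t=1}^{p} \kappa_{rst}\, a_r b_s c_t.$$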

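Using a procedure analogous to that of Hayakawa (1975), we can write the asymptotic
expansion of $S_T$ for the composite hypothesis up to order $n^{-1/2}$ as

$$S_T = -(Zy + \xi)^\top Y (Zy + \xi) - \frac{1}{2\sqrt{n}} K_{...} \circ (Zy + \xi) \circ Y^{-1}y \circ Y^{-1}y - \frac{1}{2\sqrt{n}} K_{...} \circ (Zy + \xi) \circ (Z_0 y - \xi) \circ (Z_0 y - \xi) + O_p(n^{-1}),$$

where $Z = Y^{-1} - Z_0$,

$$Z_0 = \begin{bmatrix} Y_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}, \qquad \xi = \begin{bmatrix} Y_{11}^{-1} Y_{12} \\ -I_{p-q} \end{bmatrix} \epsilon,$$

$I_{p-q}$ being the identity matrix of order $p - q$.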



We can now use a multivariate Edgeworth Type A series expansion of the joint
density function of $y$ and $Y$ up to order $n^{-1/2}$ (Peers, 1971), which has the form

$$f_1 = f_0\left[1 + \frac{1}{6\sqrt{n}}\left(K_{.,.,.} \circ K^{-1}y \circ K^{-1}y \circ K^{-1}y - 3K_{.,.,.} \circ K^{-1} \circ K^{-1}y\right) - \frac{1}{\sqrt{n}} K_{.,..} \circ K^{-1}y \circ D\right] + O(n^{-1}),$$

where

$$f_0 = (2\pi)^{-p/2}\,|K|^{-1/2} \exp\left(-\tfrac{1}{2}\, y^\top K^{-1} y\right) \prod_{r,s=1}^{p} \delta(y_{rs} - \kappa_{rs}),$$

$D = ((d_{bc}))$, $d_{bc} = \delta'(y_{bc} - \kappa_{bc})/\delta(y_{bc} - \kappa_{bc})$, with $\delta(\cdot)$ being the Dirac delta
function (Bracewell, 1999), to obtain the moment generating function of $S_T$, $M(t)$
say.

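From $f_1$ and the asymptotic expansion of $S_T$ up to order $n^{-1/2}$, we arrive, after
long algebra, at

$$M(t) = (1 - 2t)^{-(p-q)/2} \exp\left(\frac{t}{1 - 2t}\, \epsilon^\top K_{22.1}\, \epsilon\right) \left[1 + \frac{1}{\sqrt{n}}\left(A_1 d + A_2 d^2 + A_3 d^3\right)\right] + O(n^{-1}),$$

where $d = 2t/(1 - 2t)$, $K_{22.1} = K_{22} - K_{21} K_{11}^{-1} K_{12}$,

$$A_1 = -\tfrac{1}{4}\left(K_{...} \circ K^{-1} \circ \epsilon^* + 4K_{.,..} \circ A \circ \epsilon^* + K_{...} \circ A \circ \epsilon^* + K_{...} \circ \epsilon^* \circ \epsilon^* \circ \epsilon^*\right),$$

$$A_2 = -\tfrac{1}{4}\left(K_{...} \circ K^{-1} \circ \epsilon^* - K_{...} \circ A \circ \epsilon^* - 2K_{.,..} \circ \epsilon^* \circ \epsilon^* \circ \epsilon^*\right),$$

$$A_3 = -\tfrac{1}{12}\, K_{...} \circ \epsilon^* \circ \epsilon^* \circ \epsilon^*,$$

$$A = \begin{bmatrix} K_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}, \qquad \epsilon^* = \begin{bmatrix} K_{11}^{-1} K_{12} \\ -I_{p-q} \end{bmatrix} \epsilon.$$
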
When $n \to \infty$, $M(t) \to (1 - 2t)^{-(p-q)/2} \exp\{2t\lambda/(1 - 2t)\}$, where $\lambda = \epsilon^\top K_{22.1}\, \epsilon/2$,
and hence the limiting distribution of $S_T$ is a non-central chi-square distribution
with $p - q$ degrees of freedom and non-centrality parameter $\lambda$. Under $\mathcal{H}_0$, i.e. when
$\epsilon = 0$, $M(t) = (1 - 2t)^{-(p-q)/2} + O(n^{-1})$ and, as expected, $S_T$ has a central
chi-square distribution with $p - q$ degrees of freedom up to an error of order $n^{-1}$.
Also, from $M(t)$ we may obtain the first three moments of $S_T$ up to order $n^{-1/2}$ as
$\mu'_1(S_T) = p - q + 2\lambda + 2A_1/\sqrt{n}$, $\mu_2(S_T) = 2(p - q + 4\lambda) + 8(A_1 + A_2)/\sqrt{n}$ and
$\mu_3(S_T) = 8(p - q + 6\lambda) + 48(A_1 + 2A_2 + A_3)/\sqrt{n}$.
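
The limiting non-central chi-square law gives a first-order approximation to the local power, namely one minus the limiting distribution function evaluated at the critical value. A minimal Python sketch follows. It assumes the limiting moment generating function stated above, under which the non-central distribution function is a Poisson mixture with mean lambda of central chi-square distribution functions with p - q + 2j degrees of freedom (software that defines non-centrality as a sum of squared means would use 2*lambda instead). The function names and the fixed series truncation are illustrative choices, not part of the paper.

```python
import math

def chi2_cdf(x, m):
    """Central chi-square CDF with m degrees of freedom, for x > 0, using the
    power series of the regularized lower incomplete gamma function."""
    a, z = m / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, 2000):
        term *= z / (a + n)
        total += term
        if term < 1e-17 * total:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def ncx2_cdf(x, m, lam, terms=200):
    """Non-central CDF as a Poisson(lam) mixture of central chi-square CDFs
    with m + 2j degrees of freedom, j = 0, 1, 2, ..."""
    weight, total = math.exp(-lam), 0.0
    for j in range(terms):
        total += weight * chi2_cdf(x, m + 2 * j)
        weight *= lam / (j + 1)
    return total

def noncentrality(eps, k22_1):
    """lambda = eps' K_{22.1} eps / 2, with eps a list and k22_1 a nested list."""
    size = len(eps)
    quad = sum(eps[r] * k22_1[r][s] * eps[s]
               for r in range(size) for s in range(size))
    return quad / 2.0

def limiting_power(eps, k22_1, crit):
    """First-order local power: 1 - G_{p-q,lambda}(crit)."""
    df = len(eps)  # p - q, the number of restricted parameters
    return 1.0 - ncx2_cdf(crit, df, noncentrality(eps, k22_1))

# Hypothetical scalar example: K_{22.1} = 1, eps = 1, 5% critical value for 1 d.f.
print(limiting_power([1.0], [[1.0]], 3.841458820694124))
```

Setting `eps` to zero recovers the nominal size, which is a quick sanity check on the mixture representation.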



3 Main result 



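The moment generating function of $S_T$ in a neighborhood of $\theta_2 = \theta_{20}$ can be written,
after some algebra, as

$$M(t) = (1 - 2t)^{-(p-q)/2} \exp\left(\frac{t}{1 - 2t}\, \epsilon^\top K_{22.1}^{\dagger}\, \epsilon\right) \left[1 + \frac{1}{\sqrt{n}} \sum_{k=0}^{3} a_k (1 - 2t)^{-k}\right] + O(n^{-1}),$$

where

$$a_1 = \tfrac{1}{4}\Big\{K_{...}^{\dagger} \circ (K^{-1})^{\dagger} \circ (\epsilon^*)^{\dagger} - (4K_{.,..} + 3K_{...})^{\dagger} \circ A^{\dagger} \circ (\epsilon^*)^{\dagger} - 2(K_{...} + 2K_{.,..})^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} - 2(K_{2..} + K_{2,..})^{\dagger} \circ \epsilon \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger}\Big\},$$

$$a_2 = -\tfrac{1}{4}\Big\{K_{...}^{\dagger} \circ (K^{-1} - A)^{\dagger} \circ (\epsilon^*)^{\dagger} - (K_{...} + 2K_{.,..})^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger}\Big\}, \qquad (2)$$

$$a_3 = -\tfrac{1}{12}\, K_{...}^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger},$$

and $a_0 = -(a_1 + a_2 + a_3)$. The symbol "$\dagger$" denotes evaluation at $\theta = (\theta_1^\top, \theta_{20}^\top)^\top$.
Inverting $M(t)$, we arrive at the following theorem, our main result.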

Theorem 1 The asymptotic expansion of the distribution of the gradient statistic for
testing a composite hypothesis under a sequence of local alternatives converging to
the null hypothesis at rate $n^{-1/2}$ is

$$\Pr(S_T \leq x) = G_{f,\lambda}(x) + \frac{1}{\sqrt{n}} \sum_{k=0}^{3} a_k\, G_{f+2k,\lambda}(x) + O(n^{-1}), \qquad (3)$$

where $G_{m,\lambda}(x)$ is the cumulative distribution function of a non-central chi-square
variate with $m$ degrees of freedom and non-centrality parameter $\lambda$. Here, $f = p - q$,
$\lambda = \epsilon^\top K_{22.1}^{\dagger}\, \epsilon/2$ and the $a_k$'s are given in (2).

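If $q = 0$, the null hypothesis is simple, $\epsilon^* = -\epsilon$ and $A = 0$. Therefore, an
immediate consequence of Theorem 1 is the following corollary.

Corollary 1 The asymptotic expansion of the distribution of the gradient statistic
for testing a simple hypothesis under a sequence of local alternatives converging
to the null hypothesis at rate $n^{-1/2}$ is given by (3) with $f = p$, $\lambda = \epsilon^\top K^{\dagger} \epsilon/2$,
$a_0 = K_{...}^{\dagger} \circ \epsilon \circ \epsilon \circ \epsilon/6$, $a_1 = -\{K_{...}^{\dagger} \circ (K^{-1})^{\dagger} \circ \epsilon - 2K_{.,..}^{\dagger} \circ \epsilon \circ \epsilon \circ \epsilon\}/4$,
$a_2 = \{K_{...}^{\dagger} \circ (K^{-1})^{\dagger} \circ \epsilon - (K_{...} + 2K_{.,..})^{\dagger} \circ \epsilon \circ \epsilon \circ \epsilon\}/4$ and $a_3 = K_{...}^{\dagger} \circ \epsilon \circ \epsilon \circ \epsilon/12$.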



4 Power comparisons between the rival tests 

To first order, $S_T$, $LR$, $W$ and $S_R$ have the same asymptotic distributional properties
under either the null or local alternative hypotheses. Up to an error of order $n^{-1}$
the corresponding criteria have the same size but their powers differ in the $n^{-1/2}$
term. The power performance of the different tests may then be compared based
on the expansions of their power functions ignoring terms of order smaller than $n^{-1/2}$.
Harris & Peers (1980) presented a study of local power, up to order $n^{-1/2}$, for the
likelihood ratio, Wald and score tests. They showed that none of the criteria is
uniformly better than the others.

Let $S_i$ ($i = 1, 2, 3, 4$) be, respectively, the likelihood ratio, Wald, score and gradient
statistics. We can write their local powers as $\Pi_i = 1 - \Pr(S_i \leq x) = \Pr(S_i > x)$,
where

$$\Pr(S_i \leq x) = G_{p-q,\lambda}(x) + \frac{1}{\sqrt{n}} \sum_{k=0}^{3} a_{ik}\, G_{p-q+2k,\lambda}(x) + O(n^{-1}).$$

The coefficients that define the local powers of the likelihood ratio and Wald tests
are given in Hayakawa (1975), those corresponding to the score and gradient tests
are given in Harris & Peers (1980) and in (2), respectively. All of them are complicated
functions of joint cumulants of log-likelihood derivatives but we can draw the
following general conclusions:

- all four tests are locally biased;

- if $K_{...} = 0$, the likelihood ratio, Wald and gradient tests have identical local
powers;

- if $K_{...} = 2K_{.,.,.}$, the score and gradient tests have identical local powers.

Further classifications are possible for appropriate subspaces of the parameter space;
see, for instance, Harris & Peers (1980) and Hayakawa & Puri (1985). Therefore,
there is no uniform superiority of one test with respect to the others. Hence, the
gradient test, which is very simple to compute as pointed out by C.R. Rao, is an attractive
alternative to the likelihood ratio, Wald and score tests.

5 One-parameter exponential family

Let $x = (x_1, \ldots, x_n)^\top$ be a random sample of size $n$, with each $x_l$ having probability
density function $\pi(x; \theta) = \exp\{t(x; \theta)\}$, where $\theta$ is a scalar parameter. To test
$\mathcal{H}_0: \theta = \theta_0$, where $\theta_0$ is a fixed known constant, the likelihood ratio, Wald, score and
gradient statistics are, respectively,

$$S_1 = 2\sum_{l=1}^{n} \{t(x_l; \hat\theta) - t(x_l; \theta_0)\}, \qquad S_2 = n(\hat\theta - \theta_0)^2 K(\hat\theta),$$

$$S_3 = \frac{\left\{\sum_{l=1}^{n} t'(x_l; \theta_0)\right\}^2}{n K(\theta_0)}, \qquad S_4 = (\hat\theta - \theta_0) \sum_{l=1}^{n} t'(x_l; \theta_0),$$

where $\hat\theta$ is the maximum likelihood estimator of $\theta$ and $K = K(\theta)$ denotes the Fisher
information for a single observation. Under $\mathcal{H}_0$ all four statistics have a central
chi-square distribution with one degree of freedom asymptotically.

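Now, let $\kappa_{\theta\theta} = E\{t''(x; \theta)\}$, $\kappa_{\theta\theta\theta} = E\{t'''(x; \theta)\}$, $\kappa_{\theta,\theta\theta} = E\{t'(x; \theta)\,t''(x; \theta)\}$,
$\kappa^{\theta,\theta} = -\kappa_{\theta\theta}^{-1}$, etc., where primes denote derivatives with respect to $\theta$; for instance
$t''(x; \theta) = d^2 t(x; \theta)/d\theta^2$. The asymptotic expansion of the distribution of the gradient
statistic for the null hypothesis $\mathcal{H}_0: \theta = \theta_0$ under the sequence of local alternatives
$\mathcal{H}_{1n}: \theta = \theta_0 + n^{-1/2}\epsilon$ is given by (3) with $f = 1$, $\lambda = K^{\dagger}\epsilon^2/2$,

$$a_0 = \frac{\kappa_{\theta\theta\theta}^{\dagger}\, \epsilon^3}{6}, \qquad a_1 = -\frac{\kappa_{\theta\theta\theta}^{\dagger} (\kappa^{\theta,\theta})^{\dagger}\, \epsilon - 2\kappa_{\theta,\theta\theta}^{\dagger}\, \epsilon^3}{4},$$

$$a_2 = \frac{\kappa_{\theta\theta\theta}^{\dagger} (\kappa^{\theta,\theta})^{\dagger}\, \epsilon - (\kappa_{\theta\theta\theta} + 2\kappa_{\theta,\theta\theta})^{\dagger}\, \epsilon^3}{4}, \qquad a_3 = \frac{\kappa_{\theta\theta\theta}^{\dagger}\, \epsilon^3}{12}.$$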

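We now specialize to the case where $\pi(x; \theta)$ belongs to the one-parameter exponential
family. Let $t(x; \theta) = -\log \zeta(\theta) - \alpha(\theta) d(x) + v(x)$, where $\alpha(\cdot)$, $\zeta(\cdot)$, $d(\cdot)$
and $v(\cdot)$ are known functions. Also, $\alpha(\cdot)$ and $\zeta(\cdot)$ are assumed to have continuous
first three derivatives, with $\zeta(\cdot) > 0$, $\alpha'(\theta)$ and $\beta'(\theta)$ being different from zero for all
$\theta$ in the parameter space, where $\beta(\theta) = \zeta'(\theta)/\{\zeta(\theta)\alpha'(\theta)\}$. Since $K = \alpha'(\theta)\beta'(\theta)$,
$\sum_{l=1}^{n} t(x_l; \theta) = -n\{\log \zeta(\theta) + \alpha(\theta)\bar{d} - \bar{v}\}$, $\sum_{l=1}^{n} t'(x_l; \theta) = -n\alpha'(\theta)\{\beta(\theta) + \bar{d}\}$,
with $\bar{d} = \sum_{l=1}^{n} d(x_l)/n$ and $\bar{v} = \sum_{l=1}^{n} v(x_l)/n$, we have

$$S_1 = 2n\left[\log\left\{\frac{\zeta(\theta_0)}{\zeta(\hat\theta)}\right\} + \{\alpha(\theta_0) - \alpha(\hat\theta)\}\bar{d}\right], \qquad S_2 = n(\hat\theta - \theta_0)^2 \alpha'(\hat\theta)\beta'(\hat\theta),$$

$$S_3 = \frac{n\alpha'(\theta_0)\{\beta(\theta_0) + \bar{d}\}^2}{\beta'(\theta_0)}, \qquad S_4 = n(\theta_0 - \hat\theta)\,\alpha'(\theta_0)\{\beta(\theta_0) + \bar{d}\}.$$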
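
The closed forms for the four statistics in the exponential family are easy to evaluate. The Python sketch below does so for a gamma sample with known shape k and unknown rate theta, i.e. alpha(theta) = theta, zeta(theta) = theta^(-k) and d(x) = x, so that beta(theta) = -k/theta and the maximum likelihood estimator is k divided by the sample mean. The function name `gamma_rate_tests` and the data are hypothetical and serve only as an illustration.

```python
import math

def gamma_rate_tests(xs, k, theta0):
    """LR, Wald, score and gradient statistics for H0: theta = theta0 in a
    Gamma(shape=k known, rate=theta) sample, written in the exponential-family
    form with alpha(theta) = theta, zeta(theta) = theta**(-k), d(x) = x."""
    n = len(xs)
    d_bar = sum(xs) / n
    theta_hat = k / d_bar  # root of beta(theta) + d_bar = 0

    def beta(th):          # zeta'(th) / {zeta(th) * alpha'(th)} = -k / th
        return -k / th

    def beta_prime(th):    # derivative of beta
        return k / th ** 2

    alpha_prime = 1.0      # alpha(theta) = theta

    # log{zeta(theta0) / zeta(theta_hat)} = k * log(theta_hat / theta0)
    s1 = 2 * n * (k * math.log(theta_hat / theta0) + (theta0 - theta_hat) * d_bar)
    s2 = n * (theta_hat - theta0) ** 2 * alpha_prime * beta_prime(theta_hat)
    s3 = n * alpha_prime * (beta(theta0) + d_bar) ** 2 / beta_prime(theta0)
    s4 = n * (theta0 - theta_hat) * alpha_prime * (beta(theta0) + d_bar)
    return {"LR": s1, "Wald": s2, "score": s3, "gradient": s4}

# Made-up data, for illustration only.
print(gamma_rate_tests([0.5, 1.5, 1.0, 2.0], k=2.0, theta0=1.0))
```

In this parameterization the Wald and score statistics coincide algebraically, while the gradient statistic needs neither the information at the null value nor at the estimate.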

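Let $\alpha' = \alpha'(\theta)$, $\alpha'' = \alpha''(\theta)$, $\beta' = \beta'(\theta)$ and $\beta'' = \beta''(\theta)$. It can be shown that
$\kappa_{\theta\theta} = -\alpha'\beta'$, $\kappa_{\theta\theta\theta} = -(2\alpha''\beta' + \alpha'\beta'')$, $\kappa_{\theta,\theta\theta} = \alpha''\beta'$, $\kappa_{\theta,\theta,\theta} = \alpha'\beta'' - \alpha''\beta'$. The
coefficients that define the local powers of the tests that use $S_1$, $S_2$, $S_3$ and $S_4$ are

$$a_{10} = a_{20} = a_{30} = a_{40} = -a_{23} = 2a_{43} = -\frac{(2\alpha''\beta' + \alpha'\beta'')\epsilon^3}{6}, \qquad a_{11} = \frac{\alpha''\beta'\epsilon^3}{2},$$

$$a_{12} = a_{33} = \frac{(\alpha'\beta'' - \alpha''\beta')\epsilon^3}{6}, \qquad a_{31} = \frac{\alpha''\beta'\epsilon^3}{2} - \frac{(\alpha'\beta'' - \alpha''\beta')\epsilon}{2\alpha'\beta'},$$

$$a_{21} = -a_{22} = \frac{\alpha''\beta'\epsilon^3}{2} - \frac{(2\alpha''\beta' + \alpha'\beta'')\epsilon}{2\alpha'\beta'}, \qquad a_{32} = \frac{(\alpha'\beta'' - \alpha''\beta')\epsilon}{2\alpha'\beta'}, \qquad a_{13} = 0,$$

$$a_{41} = \frac{\alpha''\beta'\epsilon^3}{2} + \frac{(2\alpha''\beta' + \alpha'\beta'')\epsilon}{4\alpha'\beta'}, \qquad a_{42} = \frac{\alpha'\beta''\epsilon^3}{4} - \frac{(2\alpha''\beta' + \alpha'\beta'')\epsilon}{4\alpha'\beta'}.$$

If $\alpha(\theta) = \theta$, $\pi(x; \theta)$ corresponds to a one-parameter natural exponential family. In
this case, $\alpha' = 1$, $\alpha'' = 0$ and the $a$'s simplify considerably.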

We now present some analytical comparisons among the local powers of the four
tests for a number of distributions within the one-parameter exponential family. Let
$\Pi_i$ and $\Pi_j$ be the power functions, up to order $n^{-1/2}$, of the tests that use the statistics
$S_i$ and $S_j$, respectively, with $i \neq j$ and $i, j = 1, 2, 3, 4$. We have

$$\Pi_i - \Pi_j = \frac{1}{\sqrt{n}} \sum_{k=0}^{3} (a_{jk} - a_{ik})\, G_{1+2k,\lambda}(x). \qquad (4)$$

It is well known that

$$G_{m,\lambda}(x) - G_{m+2,\lambda}(x) = 2 g_{m+2,\lambda}(x), \qquad (5)$$

where $g_{\nu,\lambda}(x)$ is the probability density function of a non-central chi-square random
variable with $\nu$ degrees of freedom and non-centrality parameter $\lambda$. From (4) and (5),
we can state the following comparison among the powers of the four tests. Here, we
assume that $\theta > \theta_0$; opposite inequalities hold if $\theta < \theta_0$.
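
Identity (5) turns a combination of distribution functions whose weights sum to zero into a combination of densities, which is what makes the sign of a power difference easy to read off. The Python sketch below checks (5) numerically and evaluates the right-hand side of (4) both ways. It assumes that each row of coefficients sums to zero, as implied by the relation between the leading coefficient and the other three in Section 3, and it uses the Poisson-mixture representation of the non-central chi-square law with mixing mean lambda. All function names are hypothetical.

```python
import math

def chi2_pdf(x, m):
    """Density of a central chi-square variate with m degrees of freedom (x > 0)."""
    half = m / 2.0
    return math.exp((half - 1.0) * math.log(x) - x / 2.0
                    - half * math.log(2.0) - math.lgamma(half))

def chi2_cdf(x, m):
    """Central chi-square CDF (x > 0) via the series for the regularized
    lower incomplete gamma function."""
    a, z = m / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, 2000):
        term *= z / (a + n)
        total += term
        if term < 1e-17 * total:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def poisson_mixture(fn, x, m, lam, terms=150):
    """Non-central version of fn: Poisson(lam) weights over m + 2j d.f."""
    weight, total = math.exp(-lam), 0.0
    for j in range(terms):
        total += weight * fn(x, m + 2 * j)
        weight *= lam / (j + 1)
    return total

def G(x, m, lam):
    """Non-central chi-square distribution function."""
    return poisson_mixture(chi2_cdf, x, m, lam)

def g(x, m, lam):
    """Non-central chi-square density."""
    return poisson_mixture(chi2_pdf, x, m, lam)

def power_diff_cdf(a_i, a_j, n, x, lam):
    """Right-hand side of (4) for one-parameter tests i and j."""
    return sum((a_j[k] - a_i[k]) * G(x, 1 + 2 * k, lam)
               for k in range(4)) / math.sqrt(n)

def power_diff_pdf(a_i, a_j, n, x, lam):
    """The same quantity after summation by parts and (5); this form is only
    valid when both coefficient rows sum to zero."""
    c = [a_j[k] - a_i[k] for k in range(4)]
    partial = [c[0], c[0] + c[1], c[0] + c[1] + c[2]]
    return 2.0 * sum(partial[k] * g(x, 3 + 2 * k, lam)
                     for k in range(3)) / math.sqrt(n)
```

Because the densities are positive, the sign of a power difference is determined by the signs of the partial sums of the coefficient differences, which is how orderings of the tests can be obtained analytically.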

1. Normal ($\theta > 0$, $-\infty < \mu < \infty$ and $x \in \mathbb{R}$):

   - $\mu$ known: $\alpha(\theta) = (2\theta)^{-1}$, $\zeta(\theta) = \theta^{1/2}$, $d(x) = (x - \mu)^2$ and
     $v(x) = -\{\log(2\pi)\}/2$, $\Pi_4 = \Pi_3 > \Pi_1 > \Pi_2$.

   - $\theta$ known: $\alpha(\mu) = -\mu/\theta$, $\zeta(\mu) = \exp\{\mu^2/(2\theta)\}$, $d(x) = x$ and
     $v(x) = -\{x^2/\theta + \log(2\pi\theta)\}/2$, $\Pi_1 = \Pi_2 = \Pi_3 = \Pi_4$.

2. Inverse normal ($\theta > 0$, $\mu > 0$ and $x > 0$):

   - $\mu$ known: $\alpha(\theta) = \theta$, $\zeta(\theta) = \theta^{-1/2}$, $d(x) = (x - \mu)^2/(2\mu^2 x)$ and
     $v(x) = -\{\log(2\pi x^3)\}/2$, $\Pi_4 > \Pi_1 > \Pi_2 = \Pi_3$.

   - $\theta$ known: $\alpha(\mu) = \theta/(2\mu^2)$, $\zeta(\mu) = \exp\{-\theta/\mu\}$, $d(x) = x$ and
     $v(x) = -\{\theta/x - \log(\theta/(2\pi x^3))\}/2$, $\Pi_4 = \Pi_3 > \Pi_1 > \Pi_2$.

3. Gamma ($k > 0$, $k$ known, $\theta > 0$ and $x > 0$): $\alpha(\theta) = \theta$, $\zeta(\theta) = \theta^{-k}$, $d(x) = x$
   and $v(x) = (k - 1)\log(x) - \log\{\Gamma(k)\}$, where $\Gamma(\cdot)$ is the gamma function,
   $\Pi_4 > \Pi_1 > \Pi_2 = \Pi_3$.

4. Truncated extreme value ($\theta > 0$ and $x > 0$): $\alpha(\theta) = \theta^{-1}$, $\zeta(\theta) = \theta$,
   $d(x) = \exp(x) - 1$ and $v(x) = x$, $\Pi_4 = \Pi_3 > \Pi_1 > \Pi_2$.

5. Pareto ($\theta > 0$, $k > 0$, $k$ known and $x > k$): $\alpha(\theta) = 1 + \theta$, $\zeta(\theta) = (\theta k^{\theta})^{-1}$,
   $d(x) = \log(x)$ and $v(x) = 0$, $\Pi_4 > \Pi_1 > \Pi_2 = \Pi_3$.

6. Laplace ($\theta > 0$, $-\infty < k < \infty$, $k$ known and $x \in \mathbb{R}$): $\alpha(\theta) = \theta^{-1}$, $\zeta(\theta) = 2\theta$,
   $d(x) = |x - k|$ and $v(x) = 0$, $\Pi_4 = \Pi_3 > \Pi_1 > \Pi_2$.

7. Power ($\theta > 0$, $\phi > 0$, $\phi$ known and $x > 0$): $\alpha(\theta) = 1 - \theta$, $\zeta(\theta) = \theta^{-1}\phi^{\theta}$,
   $d(x) = \log(x)$ and $v(x) = 0$, $\Pi_4 > \Pi_1 > \Pi_2 = \Pi_3$.

6 Discussion

The gradient test can be an interesting alternative to the classic large-sample tests,
namely the likelihood ratio, Wald and Rao score tests. It is competitive with the other
three tests since none is uniformly superior to the others in terms of second order local
power, as we showed. Unlike the Wald and the score statistics, the gradient statistic
does not require one to obtain, estimate or invert an information matrix, which can be an
advantage in complex problems.

Theorem 3 in Terrell (2002) points to another important feature of the gradient
test. It suggests that we can, in general, improve the approximation of the distribution
of the gradient statistic by a chi-square distribution under the null hypothesis by using
a less biased estimator of $\theta$. It is well known that the maximum likelihood estimator
can be bias-corrected using the results of Cox & Snell (1968) or the approach proposed by
Firth (1993). The effect of replacing the maximum likelihood estimator by its
bias-corrected versions will be studied in future research. Note that, unlike $LR$ and $S_R$,
the gradient statistic is not invariant under non-linear reparameterizations, as is the
case of $W$. However, we can improve its performance, under the null hypothesis, by
choosing a parameterization under which the maximum likelihood estimator is nearly
unbiased.

Our results are quite general, and can be specialized to important classes of
statistical models, such as the generalised linear models. Local power comparisons
of the three usual large-sample tests in generalised linear models are presented by
Cordeiro et al. (1994) and Ferrari et al. (1997). The extension of their studies to
include the gradient test will be reported elsewhere.

As a final remark, the power comparisons performed in the present paper consider
the four tests in their original form, i.e. they are not corrected to achieve local
unbiasedness; see Rao & Mukerjee (1997) and references therein for this alternative
approach. In fact, this approach can be explored in future work for the gradient test.